Abstract

Content-based video segmentation and classification is a key to the success of future multimedia databases. 
Research in this area in the past several years has focused on the use of speech recognition and image 
analysis techniques. As a complementary effort to prior research, we have focused on the use of motion and 
audio characteristics. Fundamental to both segmentation and classification tasks is the characterization by 
certain features of a given video segment. In this paper, we describe several audio and motion features that 
have been found to be effective in distinguishing motion and audio characteristics of different types of scene 
Abstract

A hierarchical system for audio classification and retrieval based on audio content analysis is presented in this
paper. The system consists of three stages. The first stage is called the coarse-level audio classification and 
segmentation, where audio recordings are classified and segmented into speech, music, several types of 
environmental sounds, and silence, based on morphological and statistical analysis of temporal curves of 
short-time features of audio signals. In the second stage, environmental sounds are further classified into finer
classes such as applause, rain, bird sound, etc. This fine-level classification is based on time-frequency 
analysis of audio signals and use of the hidden Markov model (HMM) for classification. In the third stage, the 
query-by-example audio retrieval is implemented where similar sounds can be found according to an input 
sample audio. It is shown that the proposed system has achieved an accuracy higher than 90% for coarse-level
audio classification. Examples of audio fine classification and audio retrieval are also provided.
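
The coarse-level stage described above works on temporal curves of short-time features. The following is a minimal sketch of how such curves might be computed, not the system's actual method: the abstract does not name the features, so short-time energy and zero-crossing rate are assumed here, and the frame length, hop size, silence threshold, and all function names are hypothetical.

```python
import math

def frames(signal, frame_len, hop):
    """Split a sequence of samples into overlapping fixed-length frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def feature_curves(signal, frame_len=256, hop=128):
    """Return the (energy, zero-crossing-rate) curves over time."""
    fs = frames(signal, frame_len, hop)
    return ([short_time_energy(f) for f in fs],
            [zero_crossing_rate(f) for f in fs])

def label_silence(energy_curve, threshold=1e-4):
    """Assumed rule: a frame is silence when its energy is below threshold."""
    return ["silence" if e < threshold else "non-silence"
            for e in energy_curve]

# Usage: 512 silent samples followed by 512 samples of a 440 Hz tone at 8 kHz.
signal = [0.0] * 512 + [0.5 * math.sin(2 * math.pi * 440 * n / 8000)
                        for n in range(512)]
energy, zcr = feature_curves(signal)
labels = label_silence(energy)
```

A fixed threshold only separates silence from everything else; distinguishing speech, music, and environmental sounds would require the further curve analysis the abstract mentions.
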
Abstract

Determining automatically what constitutes a scene in a video is a challenging task, particularly since there is
no precise definition of the term "scene". It is left to the individual to set attributes shared by consecutive shots
which group them into scenes. Certain basic attributes such as dialogs, like settings and continuing sounds are
consistent indicators. We have therefore developed a scheme for identifying scenes by clustering shots 
according to detected dialogs, like settings and similar audio. Results from experiments show automatic 
identification of these types of scenes to be reliable.
Abstract

Many audio and multimedia applications would benefit from the ability to classify and search for audio based
on its characteristics. The audio analysis, search, and classification engine described here reduces sounds to
perceptual and acoustical features. This lets users search or retrieve sounds by any one feature or a 
combination of them, by specifying previously learned classes based on these features, or by selecting or 
entering reference sounds and asking the engine to retrieve similar or dissimilar sounds.
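
Retrieval by reference sound, as described above, can be illustrated by ranking stored feature vectors by their distance to a query vector. This is a sketch under stated assumptions rather than the engine's implementation: the abstract does not specify the features or the distance measure, so the three-element vectors (imagined as normalized loudness, brightness, and pitch), the Euclidean distance, and the names `euclidean` and `rank_by_similarity` are all hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_similarity(query, library, dissimilar=False):
    """Order sound names by distance to the query vector.

    With dissimilar=True the most distant sounds come first.
    """
    scored = sorted(((euclidean(query, vec), name)
                     for name, vec in library.items()),
                    reverse=dissimilar)
    return [name for _, name in scored]

# Hypothetical library: each sound reduced to three normalized features.
library = {
    "laugh":   [0.2, 0.8, 0.5],
    "bell":    [0.6, 0.9, 0.9],
    "thunder": [0.9, 0.1, 0.1],
}
query = [0.25, 0.75, 0.5]  # features of the reference sound

most_similar = rank_by_similarity(query, library)
most_dissimilar = rank_by_similarity(query, library, dissimilar=True)
```

In practice the features would need comparable scales before a plain Euclidean distance is meaningful, which is why the example vectors are assumed to be normalized already.
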
Abstract

We study an important problem in multimedia databases, namely the automatic extraction of indexing
information from raw data based on video contents. The goal of our research project is to develop a prototype
system for automatic indexing of sports videos. The novelty of our work is that we propose to integrate speech
understanding and image analysis algorithms for extracting information. The main thrust of this work comes 
from the observation that in news or sports video indexing, usually speech analysis is more efficient in 
detecting events than image analysis. Therefore, in our system, the audio processing modules are first applied
to locate candidates in the whole data. This information is passed to the video processing modules, which 
further analyze the video. The final products of video analysis are in the form of pointers to the locations of 
interesting events in a video. Our algorithms have been tested extensively with real TV programs, and results
are presented and discussed.